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Exact 2+1 Flavour RHMC Simulations 
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We consider the Rational Hybrid Monte Carlo algorithm for performing exact 2+1 flavour fermion simulations. 
The specific cases of ASQTAD and domain wall fermions are considered. We find that in both cases the naiVe 
performance is similar to conventional hybrid algorithms. 



1. Introduction 

Traditionally, "2+1" flavour simulations have 
been performed using the R algorithm [1], but an 
exact algorithm is clearly desirable. There also 
exist exact Polynomial Hybrid Monte Carlo al- 
gorithms [2,3], which can be used for 2+1 simu- 
lations: however, such algorithms are expensive 
with regard to memory consumption and for the 
case of ASQTAD fermions such an algorithm would 
be impractical due to the very expensive force 
term calculation. We explore the use of the Ra- 
tional Hybrid Monte Carlo (RHMC) algorithm 
applied to ASQTAD and domain wall 2+1 simula- 
tions. 



2. Rational Hybrid Monte Carlo 

The RHMC algorithm [4] is an exact algorithm 
which allows the simulation of theories where the 
fermionic determinant is raised to an non-integer 
power. As with conventional Hybrid Monte Carlo 
(HMC) [5], the determinant is replaced by an in- 
tegral over the exponential of an action contain- 
ing bosonic pseudofermion fields. The non-local 
fermion matrix is replaced by a rational approxi- 
mation, 



Such approximations are cheap to evaluate since 
they can be written in partial fraction form, al- 
lowing evaluation using a multi-shift solver. 

Rational approximations have a great advan- 
tage over polynomial approximations of the same 
degree, because rational approximations typically 
have an error many orders of magnitude smaller. 
It is this important feature, which allows the 
use of a conventional Metropolis acceptance test 
(cost oc y^/"') as opposed to a noisy estimator 
(cost cx T^^). 

When performing RHMC, we must ensure that 
the approximations used for the heatbath and the 
evaluation of the accept/reject Hamiltonian are 
exact (to machine precision) , otherwise we would 
introduce a systematic error into our simulation. 
This requirement is not true for the molecular 
dynamics (MD) evolution, where any error in- 
troduced is corrected for by the acceptance test. 
As this error increases, a decrease in the accep- 
tance probability will be observed. Typically this 
means that we have two approximations when 
performing our simulations, a high order approx- 
imation which is used for the heatbath and ac- 
cept/reject evaluation, and a lower order approx- 
imation used for the MD evolution through phase 
space. 



3. ASQTAD Simulations 

ASQTAD simulations are popular at the mo- 
ment, since they are computationally very cheap 
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compared to the chiral approaches favoured for 
theoretical reasons. As with conventional stag- 
gered fermions, ASQTAD fermions naturally de- 
scribe a theory of four degenerate flavours, and we 
are forced to take the square root of the fermion 
matrix to obtain a theory which we may hope de- 
scribes two degenerate flavours. Similarly, we can 
take a fourth root to obtain a single quark theory, 
which we take to be the strange quark contribu- 
tion. For such a 2-|-l theory, the fermionic action 
reads 

St = tpM{mf = 771)^^/^7/; -I- xM{mf = toJ^-^^'^x, 

where m = (m^ + md)/2. With such an action, we 
can use RHMC as described above with the two 
fields V and %• The eigenspectrum of the ASQ- 
TAD operator is bounded from below by mf, so it 
trivial to ensure that the rational approximation 
encompasses all of the spectrum. 

A complexity does arise when we consider the 
calculation of the force contribution when inte- 
grating Hamilton's equations. When performing 
ASQTAD-type simulations using the R algorithm 
at typical quark masses, the computational cost 
is split roughly equally between the matrix inver- 
sion and the cost of the derivative of the matrix 
with respect to the gauge field. If we were to 
proceed naively using the same formulation, we 
would be evaluating this derivative n + n^ times 
(where n and are the degrees of the MD ra- 
tional approximations for the light and strange 
quark contributions, of 0(10) for typical light 
masses), since each term in the partial fraction 
expansion requires a different field with a diS^er- 
ent shift. This would lead to an algorithm ap- 
proximately ^{n + n^) more expensive than the R 
algorithm. 

The solution lies in how we calculate this 
derivative. The ASQTAD force term is composed 
of terms like U . . .UXXW . . .U, where X = 
M.~^(j). When there are just one or two pseudo- 
fermion fields 0, it is most efficient to compute 
the U . . . UX products first and then evaluate the 
outer product: indeed, this is how the R algo- 
rithm is implemented. However, it is crucial that 
the shift dependence is only present in the vectors 
which appear in the force term. This suggests 
that for the RHMC force term, where there are 



many vectors Xj, if we perform the link matrix 
multiplication first, which is shift independent, 
and then include the outer product for each par- 
tial fraction contribution, we will vastly reduce 
the number of operations performed, i.e., 

n 

Y,{U...UXi ){xju...u) = 

■n 

= u...u(j2^iXl)u...u 

i=l 

The drawback of this approach is that we will 
still be doing more operations than if we were 
performing the R algorithm calculation. It turns 
out that the operations required scale with vol- 
ume V as (782, 424 + 720n)F, where n is the de- 
gree of the approximation, compared to 196, 9201^ 
for the R algorithm. This would imply an ap- 
proximate four-fold overhead, but since the only 
mass dependence in the derivative appears in the 
vectors, we can combine the calculation of the 
(lerivativc for the light and strange contributions, 
thus reducing the overhead by about a factor of 
two. Also, for the 2-1-1 case, the R algorithm re- 
quires that the ASQTAD operator be constructed 
three times within a single MD step (once for each 
heatbath, and once for the inversion), compared 
to only once for inversion with RHMC. Taking all 
of the above into account, we can see that the R 
algorithm scales for 2+1 like 828, 288^, so the R 
algorithm is actually more expensive. 

The current implementation of RHMC for ASQ- 
TAD fermions on the QCDOC is very competitive 
with the R algorithm implementation. Their re- 
spective efficiencies are very similar at around 
36% of peak performance. It is difficult to com- 
pare the two algorithms since the R algorithm is 
an inexact algorithm, and strictly requires an ex- 
trapolation to zero step size. 

4. Domain Wall Simulations 

The case of Furman-Shamir domain wall sim- 
ulations is quite different to ASQTAD type simu- 
lations. We now have a 2-1-1 action as follows 

M{mt = m) y M{mt = mj 
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For the two flavour contribution we can use con- 
ventional HMC evolution, the complication is 
only present for the strange quark contribution. 
We represent the square root which appears in the 
action by a rational approximation and proceed 
using conventional HMC for the degenerate light 
contribution and RHMC for the strange contri- 
bution. Unfortunately, it is not quite so simple 
because of the inclusion of the Pauli-Villars field 
in the numerator in the square root. Since rUf 
does not appear as a multiple of the identity in 
the domain wall fermion matrix, it is not possible 
to write a rational approximation as a function of 
this ratio in terms of shifted matrices. This does 
not prevent evaluation of such an approximation, 
but it does preclude using a multi-shift solver for 
the evaluation, rendering the formulation expen- 
sive. 

The solution to this problem is to split this ra- 
tio into separate fields. The action is then written 
as 

M{m! = m) 

+ (^M{mt = 1)^ 4> + xM{mt = m,)"^X) 

where we have to perform RHMC on the last two 
fields. Although we are now including an extra 
field, with which we have to perform a matrix in- 
version, the additional cost is negligible since the 
mass is O(IOO) times greater than that of the de- 
generate light pair. Indeed, the cost of including 
the effects of the strange quark are small com- 
pared to that of the light pair since it too is rela- 
tively heavy. This cost can be reduced by using a 
Sexton-Weingarten integration scheme [6], using 
a smaller step size for the light quark pair. With 
such an implementation, the overhead of perform- 
ing 2-1-1 over regular 2 flavour is negligible. It 
should be noted that we arc forced to perform 
a matrix inversion to include the bosonic Pauli- 
Villars field — a surprising result. Since the force 
calculation is trivial for domain wall fermions, the 
contribution to the force is performed naively as 
the extra cost is negligible. 

We have implemented this algorithm and have 
tested simple observables to ensure its correct- 
ness. A more systematic study of algorithmic 
performance using domain wall fermions shall be 



forthcoming in a future publication. 

5. Conclusions 

We have demonstrated that RHMC can be used 
to generate gauge configurations efficiently when 
applied to 2+1 ASQTAD simulations. The naive 
cost is similar to the R algorithm for a given step 
size, but the latter requires extrapolation to zero 
step size. 

Initial findings from performing an exact 2-|-l 
domain wall simulation have been presented. 
These initial results indicate that RHMC is per- 
fectly suited for this fermion formulation. 

When performing simulations using RHMC 
we automatically are using multipseudofermions 
leading to a Hasenbusch-type force reduction, 
which can be utilised to allow an increase in the 
step size. This accelerates the performance of 
RHMC over that of the R algorithm and con- 
ventional HMC significantly. The effect of this 
acceleration, and how these 2-1-1 simulations be- 
have on large volumes and small masses, shall be 
investigated in initial runs on the QCDOC. 
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